Abstract. We propose a method for automatic local time-adaptation 
of the spectrogram of audio signals: it is based on the decomposition of 
a signal within a Gabor multi-frame through the STFT operator. The 
sparsity of the analysis in every individual frame of the multi-frame is 
evaluated through the Renyi entropy measures: the best local resolution 
is determined by minimizing the entropy values. The overall spectrogram 
of the signal we obtain thus provides local optimal resolution adaptively 
evolving over time. We give examples of the performance of our algorithm 
with an instrumental sound and a synthetic one, showing the improve- 
ment in spectrogram displaying obtained with an automatic adaptation 
of the resolution. The analysis operator is invertible, thus leading to a 
perfect reconstruction of the original signal through the analysis coeffi- 
cients. 

Keywords: adaptive spectrogram, sound representation, sound analy- 
sis, sound synthesis, Renyi entropy, sparsity measures, frame theory 



1 Introduction 

Far from being restricted to entertainment, sound processing techniques are re- 
quired in many different domains: they find applications in medical sciences, 
security instruments, communications among others. The most challenging class 
of signals to consider is indeed music: the completely new perspective opened 
by contemporary music, assigning a fundamental role to concepts such as noise and 
timbre, gives musical potential to every sound. 

The standard techniques of digital analysis are based on the decomposition 
of the signal in a system of elementary functions, and the choice of a specific 
system necessarily has an influence on the result. Traditional methods based on 
single sets of atomic functions have important limits: a Gabor frame imposes a 
fixed resolution over the whole time-frequency plane, while a wavelet frame gives a 
strictly determined variation of the resolution. Moreover, the user is frequently 
asked to define the analysis window features by hand, which in general is not a 
simple task even for experienced users. This motivates the search for adaptive 
methods of sound analysis and synthesis, and for algorithms whose parameters 
are designed to change according to the analyzed signal features. Our research 
is focused on the development of mathematical models and tools based on the 
local automatic adaptation of the system of functions used for the decomposition 
of the signal: we are interested in a complete framework for analysis, spectral 
transformation and re-synthesis; thus we need to define an efficient strategy to 
reconstruct the signal through the adapted decomposition, which must give a 
perfect recovery of the input if no transformation is applied. 

Here we propose a method for local automatic time-adaptation of the Short 
Time Fourier Transform window function, through a minimization of the Renyi 
entropy [22] of the spectrogram; we then define a re-synthesis technique with 
an extension of the method proposed in [12]. Our approach can be presented 
schematically in three parts: 

1. a model for signal analysis exploiting concepts of Harmonic Analysis, and 
Frame Theory in particular: it is a generally highly redundant decomposing 
system belonging to the class of multiple Gabor frames [5], [15]; 

2. a sparsity measure defined on time-frequency localized subsets of the analy- 
sis coefficients, in order to determine local optimal concentration; 

3. a reduced representation obtained from the original analysis using the in- 
formation about optimal concentration, and a synthesis method through an 
expansion in the reduced system obtained. 

We have realized a first implementation of this scheme in two different ver- 
sions: for both of them a sparsity measure is applied on subsets of analysis co- 
efficients covering the whole frequency dimension, thus defining a time-adapted 
analysis of the signal. The main difference between the two concerns the first 
part of the model, that is the single frames composing the multiple Gabor frame. 
This is a key point as the first and third part of the scheme are strictly linked: 
the frame used for re-synthesis is a reduction of the original multi-frame, so the 
entire model depends on how the analysis multi-frame is designed. The section 
Frame Theory in Sound Analysis and Synthesis treats this part of our research 
in more detail. 

The second point of the scheme is related to the measure applied on the coef- 
ficients of the analysis within the multi-frame to determine local best resolutions. 
We consider measures borrowed from Information Theory and Probability The- 
ory according to the interpretation of the analysis within a frame as a probability 



A Method for Local Time-adaptation of the Spectrogram 3 

density [5]: our model is based on a class of entropy measures known as Renyi 
entropies which extend the classical Shannon entropy. The fundamental idea is 
that minimizing the complexity or information over a set of time-frequency rep- 
resentations of the same signal is equivalent to maximizing the concentration and 
peakiness of the analysis, thus selecting the best resolution tradeoff [1]: in the 
section Renyi Entropy of Spectrograms we describe how a sparsity measure can 
consequently be defined through an information measure. Finally, in the fourth 
section we provide a description of our algorithm and examples of adapted spec- 
trogram for different sounds. 

Some examples of this approach can be found in the literature: the idea of 
gathering a sparsity measure from Renyi entropies is detailed in pQ, and in [14] 
a local time-frequency adaptive framework is presented exploiting this concept, 
even if no methods for perfect reconstruction are provided. In [21] sparsity is 
obtained through a regression model; a recent development in this sense is con- 
tained in [15] where a class of methods for analysis adaptation are obtained sepa- 
rately in the time and frequency dimension together with perfect reconstruction 
formulas; however, no strategies for automation are employed, and adaptation 
has to be managed by the user. The model conceived in Q]5] belongs to this 
same class but presents several novelties in the construction of the Gabor 
multi-frame and in the method for automatic local time-adaptation. In [16] another 
time-frequency adaptive spectrogram is defined considering a sparsity measure 
called energy smearing, without taking into account the re-synthesis task. The 
concept of quilted frame, recently introduced in [10] , is the first promising effort 
to establish a unified mathematical model for all the various frameworks cited 
above. 



2 Frame Theory in Sound Analysis and Synthesis 

When analyzing a signal through its decomposition, the features of the repre- 
sentation are influenced by the decomposing functions; the Frame Theory (see 
[3], [13] for detailed mathematical descriptions) allows a unified approach when 
dealing with different bases and systems, studying the properties of the operators 
that they identify. The concept of frame extends the one of orthonormal basis in 
a Hilbert space, and it provides a theory for the discretization of time-frequency 
densities and operators [5], [20], [2]. Both the STFT and the Wavelet transform 
can be interpreted within this setting (see [17] for a comprehensive survey of 
theory and applications). 

Here we summarize the basic definitions and theorems, and outline the funda- 
mental step consisting in the introduction of Multiple Gabor Frames, which is 
comprehensively treated in [3]. The problem of standard frames is that the de- 
composing atoms are defined from the same original function, thus imposing a 
limit on the type of information that one can deduce from the analysis coeffi- 
cients; if we were able to consider frames where several families of atoms coexist, 
then we would have an analysis with variable information, at the price of a higher 
redundancy. 

2.1 Basic Definitions and Results 

Given a Hilbert space $\mathcal{H}$ seen as a vector space on $\mathbb{C}$, with its own scalar product, 
we consider in $\mathcal{H}$ a set of vectors $\{\phi_\gamma\}_{\gamma\in\Gamma}$ where the index set $\Gamma$ may be infinite 
and $\gamma$ can also be a multi-index. The set $\{\phi_\gamma\}_{\gamma\in\Gamma}$ is a frame for $\mathcal{H}$ if there exist 
two positive non-zero constants $A$ and $B$, called frame bounds, such that for all 
$f \in \mathcal{H}$, 

\[ A\|f\|^2 \le \sum_{\gamma\in\Gamma} |\langle f,\phi_\gamma\rangle|^2 \le B\|f\|^2 . \tag{1} \]

We are interested in the case $\mathcal{H} = L^2(\mathbb{R})$ and $\Gamma$ countable, as it represents 
the standard situation where a signal $f$ is decomposed through a countable set 
of given functions $\{\phi_k\}_{k\in\mathbb{Z}}$. The frame bounds $A$ and $B$ are the infimum and 
supremum, respectively, of the eigenvalues of the frame operator $U$, defined as 

\[ U f = \sum_{k\in\mathbb{Z}} \langle f,\phi_k\rangle\, \phi_k . \tag{2} \]

For any frame $\{\phi_k\}_{k\in\mathbb{Z}}$ there exist dual frames $\{\tilde\phi_k\}_{k\in\mathbb{Z}}$ such that for all 
$f \in L^2(\mathbb{R})$, 

\[ f = \sum_{k\in\mathbb{Z}} \langle f,\phi_k\rangle\, \tilde\phi_k = \sum_{k\in\mathbb{Z}} \langle f,\tilde\phi_k\rangle\, \phi_k , \tag{3} \]

so that given a frame it is always possible to perfectly reconstruct a signal $f$ 
using the coefficients of its decomposition through the frame. The inverse of the 
frame operator allows the calculation of the canonical dual frame 

\[ \tilde\phi_k = U^{-1}\phi_k \tag{4} \]

which guarantees minimal-norm coefficients in the expansion. 
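
As a small finite-dimensional illustration of (1)-(4), the following sketch builds a redundant frame of three vectors in the plane, computes the frame operator, the frame bounds and the canonical dual frame, and checks perfect reconstruction. The particular vectors and the test signal are illustrative choices and are not taken from the text.

```python
import numpy as np

# Three unit vectors at 120 degrees in R^2: a redundant, non-orthogonal frame.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
phi = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # one atom per row

# Frame operator U f = sum_k <f, phi_k> phi_k, here a 2x2 symmetric matrix.
U = phi.T @ phi

# Frame bounds A and B: smallest and largest eigenvalue of U.
eigenvalues = np.linalg.eigvalsh(U)
A, B = eigenvalues[0], eigenvalues[-1]

# Canonical dual frame: rows are U^{-1} phi_k (U is symmetric).
phi_dual = phi @ np.linalg.inv(U)

f = np.array([0.3, -1.2])            # arbitrary test signal
coefficients = phi @ f               # analysis coefficients <f, phi_k>
f_rec = phi_dual.T @ coefficients    # expansion on the dual frame
```

For this particular system the two bounds coincide, so the frame is tight and the dual atoms are just rescaled copies of the original ones.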

A Gabor frame is obtained by time-shifting and frequency-transposing a window 
function $g$ according to a regular grid. Gabor frames are particularly interesting in 
applications, as the analysis coefficients are simply given by sampling the 
STFT of $f$ with window $g$ at the nodes of a specified lattice. Given a 
time step $a$ and a frequency step $b$ we write $u_n = an$ and $\xi_k = bk$, with $n, k \in \mathbb{Z}$; 
these two sequences generate the nodes of the time-frequency lattice $\Lambda$ for the 
frame $\{g_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ defined as 

\[ g_{n,k}(t) = g(t-u_n)\, e^{2\pi i \xi_k t} ; \tag{5} \]

the nodes are the centers of the Heisenberg boxes associated to the windows in 
the frame. The lattice has to satisfy certain conditions for $\{g_{n,k}\}$ to be a frame 
[jj], which impose limits on the choice of the time and frequency steps: for certain 
choices [7] which are often adopted in standard applications, the frame operator 
takes the form of a multiplication, 

\[ U f(t) = \Big( b^{-1} \sum_{n\in\mathbb{Z}} |g(t-u_n)|^2 \Big) f(t) , \tag{6} \]

and the dual frame is easily calculated by means of a straight multiplication of 
the atoms in the original frame. The relation between the steps $a$, $b$ and the 
frame bounds $A$, $B$ in this case is clear from (6), as the frame condition implies 

\[ 0 < A \le b^{-1} \sum_{n\in\mathbb{Z}} |g(t-u_n)|^2 \le B < \infty . \tag{7} \]

Thus we see that the frame bounds provide also information on the redundancy 
of the decomposition of the signal within the frame. 
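
The multiplication form (6) and the bounds (7) can be checked numerically in a discrete, periodic setting. The sketch below is only an illustration under assumptions that are not in the text: signals of length `N` on a circular time axis, a periodic Hann window of length `Lw`, hop `a`, and `M >= Lw` frequency channels, so that the discrete counterpart of $b^{-1}$ is `M`.

```python
import numpy as np

N, Lw, a, M = 32, 8, 2, 8        # signal length, window length, hop, channels
t = np.arange(N)
g = np.zeros(N)
g[:Lw] = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(Lw) / Lw)   # periodic Hann

# All atoms g_{n,k}[t] = g[t - n a] exp(2 pi i k t / M), one atom per row.
atoms = np.array([np.roll(g, n * a) * np.exp(2j * np.pi * k * t / M)
                  for n in range(N // a) for k in range(M)])

# Frame operator as a matrix: (U f)[t] = sum_j <f, atom_j> atom_j[t].
U = atoms.T @ atoms.conj()

# Discrete counterpart of (6): U is a pointwise multiplication by this weight.
diagonal = M * sum(np.roll(g, n * a) ** 2 for n in range(N // a))

# Discrete counterpart of (7): the bounds are the extrema of the weight.
A, B = diagonal.min(), diagonal.max()
```

With a hop of one quarter of the Hann window the weight is constant, so `A` and `B` coincide; a larger hop gives a non-constant weight and two distinct bounds.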



2.2 Multiple Gabor Frames 

In our adaptive framework, we look for a method to achieve an analysis with 
multiple resolutions: thus we need to combine the information coming from the 
decompositions of a signal in several frames of different window functions. Multiple 
Gabor frames have been introduced in [22] to provide the original Gabor 
analysis with flexible multi-resolution techniques: given an index set $L \subset \mathbb{Z}$ and 
different frames $\{g^l_{n,k}\}_{(n,k)\in\mathbb{Z}^2}$ with $l \in L$, a multiple Gabor frame is obtained 
as the union of the single given frames. The different $g^l$ do not necessarily share 
the same type or shape: in our method an original window is modified with a 
finite number of scalings 

\[ g^l(t) = \frac{1}{\sqrt{l}}\, g\!\left(\frac{t}{l}\right) ; \tag{8} \]

then all the scaled versions are used to build $|L|$ different frames which constitute 
the initial multi-frame. 

A Gabor multi-frame has in general a significant redundancy which lowers the 
readability of the analysis. A possible strategy to overcome this limit is proposed 
in [15] where nonstationary Gabor frames are introduced, actually allowing the 
choice of a different window for each time location of a global irregular lattice 
$\Lambda$, or alternatively for each frequency location. This way, the window chosen is a 
function of time or frequency position in the time-frequency space, not both. In 
most applications, for this kind of frame there exist fast FFT based methods for 
the analysis and re-synthesis steps. Referring to the time case, with the abuse 
of notation $g_{n(l)}$ we indicate the window $g^l$ centered at a certain time $n(l) = u_n$ 
which is a function of the chosen window itself. Thus, a nonstationary Gabor 
frame is given by the set of atoms 

\[ \{ g_{n(l)}(t)\, e^{2\pi i b_l k t} : (n(l), b_l k) \in \Lambda \} , \tag{9} \]
where $b_l$ is the frequency step associated to the window $g^l$ and $k \in \mathbb{Z}$. If we 
suppose that the windows $g^l$ have limited time support and a sufficiently small 
frequency step $b_l$, the frame operator $U$ takes a similar form to the one in (6), 

\[ U f(t) = \sum_{n(l)} \frac{1}{b_l} |g_{n(l)}(t)|^2 f(t) . \tag{10} \]

Here, if $N = \sum_{n(l)} \frac{1}{b_l} |g_{n(l)}(t)|^2 \simeq 1$ then $U$ is invertible and the set (9) is a frame 
whose dual frame is given by 

\[ \tilde g_{n(l),k}(t) = \frac{1}{N}\, g_{n(l)}(t)\, e^{2\pi i b_l k t} . \tag{11} \]

Nonstationary Gabor frames belong to the recently introduced class of quilted 
frames [10]: in this kind of decomposing system the choice of the analysis window 
depends on both the time and the frequency location, causing more difficulties 
for an analytic fast computation of a dual frame as in (11): future 
improvements of our research concern the employment of such a decomposition 
model for automatic local adaptation of the spectrogram resolution both in the 
time and the frequency dimension. 

3 Renyi Entropy of Spectrograms 

We consider the discrete spectrogram of a signal as a sampling of the square of 
its continuous version 

\[ PS_f(u,\xi) = |S_f(u,\xi)|^2 = \left| \int f(t)\, g(t-u)\, e^{-2\pi i \xi t}\, dt \right|^2 , \tag{12} \]

where $f$ is a signal, $g$ is a window function and $S_f(u,\xi)$ is the STFT of $f$ through 
$g$. 

Such a sampling is obtained according to a regular lattice $\Lambda_{ab}$, considering a 
Gabor frame (5), 

\[ PS_f[n,k] = |S_f[u_n,\xi_k]|^2 . \tag{13} \]

With an appropriate normalization both the continuous and discrete spectro- 
gram can be interpreted as probability densities. Thanks to this interpretation, 
some techniques belonging to the domain of Probability and Information The- 
ory can be applied to our problem: in particular, the concept of entropy can be 
extended to give a sparsity measure of a time-frequency density. A promising 
approach [1] takes into account Renyi entropies, a generalization of the Shannon 
entropy: the application to our problem is related to the concept that minimizing 
the complexity or information of a set of time-frequency representations 
of the same signal is equivalent to maximizing the concentration, peakiness, and 
therefore the sparsity of the analysis. Thus we will consider as best analysis the 
sparsest one, according to the minimal entropy evaluation. 
Given a signal $f$ and its spectrogram $PS_f$ as in (12), the Renyi entropy of 
order $\alpha > 0$, $\alpha \neq 1$ of $PS_f$ is defined as follows 

\[ H_\alpha^R(PS_f) = \frac{1}{1-\alpha} \log_2 \iint_R \left( \frac{PS_f(u,\xi)}{\iint_R PS_f(u',\xi')\,du'\,d\xi'} \right)^{\alpha} du\,d\xi , \tag{14} \]

where $R \subseteq \mathbb{R}^2$ and we omit its indication if equality holds. Given a discrete 
spectrogram with time step $a$ and frequency step $b$ as in (13), we consider $R$ as 
a rectangle of the time-frequency plane $R = [t_1,t_2] \times [\nu_1,\nu_2] \subseteq \mathbb{R}^2$. It identifies 
a sequence of points $G \subseteq \Lambda_{ab}$ where $G = \{(n,k) \in \mathbb{Z}^2 : t_1 \le na \le t_2,\ \nu_1 \le kb \le \nu_2\}$. 
As a discretization of the original continuous spectrogram, every sample in 
$PS_f[n,k]$ is related to a time-frequency region of area $ab$; we thus obtain the 
discrete Renyi entropy measure directly from (14), 

\[ H_\alpha^G[PS_f] = \frac{1}{1-\alpha} \log_2 \sum_{[n,k]\in G} \left( \frac{PS_f[n,k]}{\sum_{[n',k']\in G} PS_f[n',k']} \right)^{\alpha} + \log_2(ab) . \tag{15} \]
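
A direct transcription of the discrete measure (15) may look as follows. This is a minimal sketch: the function name `renyi_entropy`, the handling of the order one through the Shannon limit, and the default steps `a = b = 1` are choices made here for illustration.

```python
import numpy as np

def renyi_entropy(ps, alpha, a=1.0, b=1.0):
    """Discrete Renyi entropy of order alpha of a spectrogram patch, as in (15).

    ps    : non-negative spectrogram samples over the region G
    alpha : entropy order, alpha > 0
    a, b  : time and frequency steps of the sampling lattice
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    p = np.asarray(ps, dtype=float)
    p = p / p.sum()                           # normalize to unit sum over G
    if alpha == 1.0:
        nonzero = p[p > 0]
        h = -np.sum(nonzero * np.log2(nonzero))   # Shannon limit of the measure
    else:
        h = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)
    return h + np.log2(a * b)

uniform = np.ones((8, 4))                     # flat patch with 32 coefficients
peak = np.zeros((8, 4))
peak[3, 2] = 1.0                              # patch with a single coefficient
```

With unit steps, the flat patch gives the logarithm of the number of coefficients and the single-coefficient patch gives zero, which are the two extreme cases of the measure.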

We will focus on discretized spectrograms with a finite number of coefficients, 
as digital signal processing requires working with finite sampled signals 
and distributions. 



Among the general properties of Renyi entropies [18], [3] and [23] we recall 
in particular those directly related with our problem. It is easy to show that for 
every finite discrete probability density $P$ the entropy $H_\alpha(P)$ tends to coincide 
with the Shannon entropy of $P$ as the order $\alpha$ tends to one. Moreover, $H_\alpha(P)$ is 
a non-increasing function of $\alpha$, so 

\[ \alpha_1 < \alpha_2 \Rightarrow H_{\alpha_1}(P) \ge H_{\alpha_2}(P) . \tag{16} \]

As we are working with finite discrete densities we can also consider the case 
$\alpha = 0$, which is simply the logarithm of the number of elements in $P$; as a consequence 
$H_0(P) \ge H_\alpha(P)$ for every admissible order $\alpha$. 

A third basic fact is that for every order $\alpha$ the Renyi entropy $H_\alpha$ is maximum 
when $P$ is uniformly distributed, while it is minimum and equal to zero when $P$ 
has a single non-zero value. 
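
The convergence to the Shannon entropy as the order tends to one can be made explicit with a short computation. The following is a standard argument, written here for a finite density $P = \{p_i\}$ with the sums restricted to the non-zero $p_i$; it uses L'Hopital's rule, since both the numerator and the denominator vanish at order one.

```latex
\begin{align*}
\lim_{\alpha\to 1} H_\alpha(P)
  &= \lim_{\alpha\to 1} \frac{\log_2 \sum_i p_i^{\alpha}}{1-\alpha}
   = \lim_{\alpha\to 1}
     \frac{\frac{d}{d\alpha}\log_2 \sum_i p_i^{\alpha}}{\frac{d}{d\alpha}(1-\alpha)} \\
  &= \lim_{\alpha\to 1}
     \frac{\sum_i p_i^{\alpha}\ln p_i}{-\ln 2\,\sum_i p_i^{\alpha}}
   = -\sum_i p_i \log_2 p_i .
\end{align*}
```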



All of these results give useful information on the values of different measures 
on a single density $P$ as in (15), while the relations between the entropies of two 
different densities $P$ and $Q$ are in general hard to determine analytically; in our 
problem, $P$ and $Q$ are two spectrograms of a signal in the same time-frequency 
area, based on two window functions with different scaling as in (8). In some 
basic cases such a relation is achievable, as shown in the following example. 



3.1 Best Window for Sinusoids 

When the spectrograms of a signal through different window functions do not 
depend on time, it is easy to compare their entropies: let $PS_s$ be the sampled 
spectrogram of a sinusoid $s$ over a finite region $G$ with a window function $g$ of 
compact support; then $PS_s$ is simply a translation in the frequency domain of 
$\hat g$, the Fourier transform of the window, and it is therefore time-independent. 
We choose a bounded set $L$ of admissible scaling factors, so that the discretized 
support of the scaled windows $g^l$ still remains inside $G$ for any $l \in L$. It is not 
hard to prove that the entropy of a spectrogram taken with such a scaled version 
of $g$ is given by 

\[ H_\alpha^G(PS_s^l) = H_\alpha^G(PS_s) - \log_2 l . \tag{17} \]
The sparsity measure we are using chooses as best window the one which minimizes 
the entropy measure: we deduce from (17) that it is the one obtained with 
the largest scaling factor available, therefore with the largest time-support. This 
is coherent with our expectation as stationary signals, such as sinusoids, are best 
analyzed with a high frequency resolution, because time-independence allows a 
low time resolution. Moreover, this is true for any order $\alpha$ used for the entropy 
calculation. Symmetric considerations apply whenever the spectrogram of a signal 
does not depend on frequency, as for impulses. 
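
The dependence on the scaling factor in (17) can be sketched in the continuous frequency variable, leaving aside discretization and the constant contribution of the time dimension. We assume here, for illustration only, that $P$ denotes the normalized frequency profile $|\hat g|^2 / \int |\hat g|^2$ of the spectrogram, that the sinusoid has frequency $\xi_0$, and that stretching the window by a factor $l$ in time compresses its Fourier transform by the same factor, giving the normalized profile $P_l$.

```latex
\begin{align*}
P_l(\xi) &= l\,P\big(l(\xi-\xi_0)\big), \qquad
  \int P_l(\xi)\,d\xi = \int P(\eta)\,d\eta = 1, \\
\int P_l(\xi)^{\alpha}\,d\xi
  &= l^{\alpha}\int P\big(l(\xi-\xi_0)\big)^{\alpha}\,d\xi
   = l^{\alpha-1}\int P(\eta)^{\alpha}\,d\eta, \\
\frac{1}{1-\alpha}\log_2\int P_l(\xi)^{\alpha}\,d\xi
  &= \frac{1}{1-\alpha}\log_2\int P(\eta)^{\alpha}\,d\eta
     + \frac{\alpha-1}{1-\alpha}\log_2 l
   = H_\alpha(P) - \log_2 l .
\end{align*}
```

The term $-\log_2 l$ does not depend on $\alpha$, in agreement with the remark that the largest window is selected for any entropy order.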



3.2 The $\alpha$ Parameter 



The $\alpha$ parameter in (14) introduces a biasing on the spectral coefficients; to 
have a qualitative description of this biasing, we consider a collection of simple 
spectrograms composed by a variable amount of large and small coefficients. We 
realize a vector $D$ of length $N = 100$ generating numbers between 0 and 1 with 
a normal random distribution; then we consider the vectors $D_M$, $1 \le M \le N$ 
such that 

\[ D_M[k] = \begin{cases} D[k] & \text{if } k \le M \\ D[k]/20 & \text{if } k > M \end{cases} \tag{18} \]

and then normalize to obtain a unitary sum. We then apply Renyi entropy 
measures with $\alpha$ varying between 0 and 30: as we see from figure 1 there is a 
relation between $M$ and the slope of the entropy curves for the different values 
of $\alpha$. For $\alpha = 0$, $H_0[D_M]$ is the logarithm of the number of non-zero coefficients 
and it is therefore constant; when $\alpha$ increases, we see that densities with a small 
amount of large coefficients gradually decrease their entropy, faster than the 
almost flat vectors corresponding to larger values of $M$. This means that by 
increasing $\alpha$ we emphasize the difference between the entropy values of a peaky 
distribution and that of a nearly flat one. The sparsity measure we consider 
selects as best analysis the one with minimal entropy, so reducing $\alpha$ raises the 
probability of less peaky distributions being chosen as sparsest: in principle, this 
is desirable as weaker components of the signal, such as partials, have to be 
taken into account in the sparsity evaluation. However, this principle should 
be applied with care as a small coefficient in a spectrogram could be determined 
by a partial as well as by a noise component; choosing an extremely small $\alpha$, 
Fig. 1. Renyi entropy evaluations of the $D_M$ vectors with varying $\alpha$; the distribution 
becomes flatter as $M$ increases. 



the best window chosen could vary without a reliable relation with spectral 
concentration depending on the noise level within the sound. 
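
The experiment on the entropy order can be reproduced qualitatively with a few lines. This sketch rests on assumptions that are not stated in the text: the random numbers between 0 and 1 are drawn from a clipped Gaussian, the small coefficients are obtained with an attenuation factor of 20, and only a few values of the order and of $M$ are evaluated.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy (base 2) of a finite discrete density, for alpha != 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                       # normalize to unit sum
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(0)
N = 100
# Assumption: normally distributed values clipped to stay inside (0, 1].
D = np.clip(rng.normal(0.5, 0.15, N), 1e-3, 1.0)

def d_vector(M, attenuation=20.0):
    """First M coefficients kept large, the others attenuated (assumed factor)."""
    d = D.copy()
    d[M:] /= attenuation
    return d

alphas = [0.0, 0.5, 2.0, 10.0, 30.0]
entropies = {M: [renyi_entropy(d_vector(M), alpha) for alpha in alphas]
             for M in (10, 50, 90)}
```

Each list in `entropies` is one entropy curve sampled at the chosen orders: all curves start from the same value at order zero, and the curve for the peakiest vector (smallest `M`) lies below the others at higher orders.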



3.3 Time and Frequency Steps 



A last remark regards the dependency of (15) on the time and frequency steps $a$ 
and $b$ used for the discretization of the spectrogram. When considering signals 
as finite vectors, a signal and its Fourier Transform have the same length. Therefore 
in the STFT the window length determines the number of frequency points, 
while the sampling rate sets the frequency values: the definition of $b$ is thus implicit 
in the window choice. Actually, the FFT algorithm allows one to specify a number 
of frequency points larger than the signal length: further frequency values are 
obtained as an interpolation between the original ones by properly adding zero 
values to the signal. If the sampling rate is fixed, this procedure causes a smaller 
$b$ as a consequence of a larger number of frequency points. We have numerically 
verified that such a variation of $b$ has no impact on the entropy calculation, so that 
the FFT size can be set according to implementation needs. 
Regarding the time step $a$, we are working on the analytical demonstration of 
a widely verified empirical observation: as long as the decomposing system is a frame the 
entropy measure is invariant to redundancy variation, so the choice of $a$ can be 
ruled by considerations on the invertibility of the decomposing frame without 
losing coherence between the information measures of the different analyses. This 
is a key point, as it states that the sparsity measure obtained allows a total 
independence between the hop sizes of the different analyses: with the implementation 
of proper structures to handle multi-hop STFTs we have obtained a 
more efficient algorithm in comparison with those imposing a fixed hop size, such as 
[16] and the first version of the one we have realized. 
4 Algorithm and Examples 



We now summarize the main operations of the algorithm we have developed 
providing examples of its application. For the calculation of the spectrograms 
we use a Hanning window 

\[ h(t) = \cos^2(\pi t)\, \chi_{[-\frac{1}{2},\frac{1}{2}]}(t) , \tag{19} \]

with $\chi$ the indicator function of the specified interval, but it is obviously possible 
to generalize the results thus obtained to the entire class of compactly supported 
window functions. In both the versions of our algorithm we create a multiple Gabor 
frame as in (5), using as mother functions some scaled versions of $h$, obtained 
as in (8) with a finite set of positive real scaling factors $L$. 

We consider consecutive segments of the signal, and for each segment we calculate 
$|L|$ spectrograms with the $|L|$ scaled windows: the length of the analysis 
segment and the overlap between two consecutive segments are given as parameters. 



In the first version of the algorithm the different frames composing the multi-frame 
have the same time step $a$ and frequency step $b$: this guarantees that for 
each signal segment the different frames have Heisenberg boxes whose centers 
lie on the same lattice on the time-frequency plane, as illustrated in figure 2. To 
guarantee that all the $|L|$ scaled windows constitute a frame when translated 
and modulated according to this global lattice, the time step $a$ must be set to 
the hop size assigned to the smallest window frame. On the other hand, as the 
FFT of a discrete signal has the same number of points as the signal itself, the 
frequency step $b$ has to be the one given by the FFT size of the largest window analysis: for the 
smaller ones, zero-padding is performed. 



Each signal segment identifies a time-frequency rectangle G for the entropy 
evaluation: the horizontal edge is the time interval of the considered segment, 
while the vertical one is the whole frequency lattice. For each spectrogram, the 
rectangle $G$ defines a subset of coefficients belonging to $G$ itself. The $|L|$ different 
subsets do not correspond to the same part of signal, as windows have different 
time supports. Therefore, a preliminary weighting of the signal has to be performed 
before the calculations of the local spectrograms: this step is necessary to 
balance the influence on the entropy calculation between coefficients which regard 
parts of signal shared or not shared by the different analysis frames. 
After the pre-weighting, we calculate the entropy of every spectrogram as in (15). 
Having the $|L|$ entropy values corresponding to the different local spectrograms, 
the sparsest local analysis is defined as the one with minimum Renyi entropy: 
the window associated to the sparsest local analysis is chosen as best window at 
all the time points contained in $G$. 
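
The selection step for one analysis segment can be sketched as follows, in the setting of the first version (common hop and common FFT size for all windows). This is a simplified illustration and not the actual implementation: the pre-weighting is omitted, the constant term $\log_2(ab)$ of (15) is dropped because it is shared by all windows, and the function names, the hop of 128 samples and the test signals are assumptions made here.

```python
import numpy as np

def hann(size):
    """Periodic Hann window of the given length."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(size) / size)

def renyi_entropy(ps, alpha):
    """Discrete Renyi entropy of a spectrogram patch (constant term omitted)."""
    p = ps / ps.sum()
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def local_spectrogram(x, centers, size, n_fft):
    """Spectrogram frames centered at `centers`, zero-padded to a common FFT size."""
    w = hann(size)
    half = size // 2
    frames = np.array([x[c - half:c + half] * w for c in centers])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def best_window(x, centers, sizes, alpha=0.7):
    """Return the window size whose local spectrogram has minimum Renyi entropy."""
    n_fft = max(sizes)                    # FFT size of the largest window
    entropies = [renyi_entropy(local_spectrogram(x, centers, s, n_fft), alpha)
                 for s in sizes]
    return sizes[int(np.argmin(entropies))], entropies

sizes = [512, 1024, 2048, 4096]
centers = 4096 + 128 * np.arange(24)      # one segment: 24 frames, hop 128
t = np.arange(16384) / 44100.0
sinusoid = np.sin(2 * np.pi * 1000.0 * t)
impulse = np.zeros(16384)
impulse[5500] = 1.0

best_for_sinusoid, _ = best_window(sinusoid, centers, sizes)
best_for_impulse, _ = best_window(impulse, centers, sizes)
```

On these two test signals the selection behaves as discussed for stationary and impulsive sounds: the largest window is chosen for the sinusoid and the smallest one for the impulse.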

The global time adapted analysis of the signal is finally realized by suitably 
assembling the slices of local sparsest analyses: they are obtained with a further 
spectrogram calculation of the unweighted signal, employing the best windows 
selected at each time point. 

Fig. 2. An analysis segment: time locations of the Heisenberg boxes associated to the 
multi-frame used in the first version of our algorithm. 

In figure 4 we give an example of an adaptive analysis performed by our first 
algorithm with four Hanning windows of different sizes on a real instrumental 
sound, a B4 note played by a marimba: this sound combines the need for a good 
time resolution at the strike with that of a good frequency resolution on the 
harmonic resonance. This is fully provided by the algorithm, as shown in the 
adaptive spectrogram at the bottom of figure 4. Moreover, we see that the 
pre-echo of the analysis at the bottom of figure 3 is completely removed in the 
adapted spectrogram. 

The main difference in the second version of our algorithm concerns the individual 
frames composing the multi-frame, which have the same frequency step 
$b$ but different time steps $\{a_l : l \in L\}$: the smallest and largest window sizes are 
given as parameters together with $|L|$, the number of different windows composing 
the multi-frame, and the global overlap needed for the analyses. The 
algorithm fixes the intermediate sizes so that, for each signal segment, the different 
frames have the same overlap between consecutive windows, and so the 
same redundancy. 

This choice highly reduces the computational cost by avoiding unnecessarily small 
hop sizes for the larger windows, and as we have observed in the previous section 
it does not affect the entropy evaluation. Such a structure generates an 
irregular time disposition of the multi-frame elements in each signal segment, 
as illustrated in figure 5; in this way we also avoid the problem of unshared 
parts of signal between the systems, but we still have a different influence of the 
boundary parts depending on the analysis frame: the beginning and the end of 
Fig. 3. Two different spectrograms of a B4 note played by a marimba, with Hanning 
windows of sizes 512 (top) and 4096 (bottom) samples. 








Fig. 4. Example of an adaptive analysis performed by the first version of our algorithm 
with four Hanning windows of different sizes (512, 1024, 2048 and 4096 samples) on a 
B4 note played by a marimba: on top, the best window chosen as a function of time; at 
the bottom, the adaptive spectrogram. The entropy order is $\alpha = 0.7$ and each analysis 
segment contains twenty-four analysis frames with a sixteen-frame overlap between 
consecutive segments. 






the signal segment have a higher energy when windowed in the smaller frames. 
This is avoided with a preliminary weighting: the beginning and the end of each 
signal segment are windowed respectively with the first and second half of the 
largest analysis window. 
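
Two ingredients of the second version, the common overlap between consecutive windows and the preliminary weighting of each segment, can be sketched as follows. The text does not say how the intermediate window sizes are spaced, so the geometric spacing used here is an assumption, as are the function names and the periodic Hann window.

```python
import numpy as np

def window_sizes(smallest, largest, count):
    """Even window sizes between the two extremes (geometric spacing assumed)."""
    sizes = np.geomspace(smallest, largest, count)
    return [int(2 * round(s / 2)) for s in sizes]

def hop_sizes(sizes, overlap):
    """Hop giving every window the same fractional overlap, hence the same redundancy."""
    return [max(1, int(round(s * (1.0 - overlap)))) for s in sizes]

def preweight(segment, largest):
    """Window the two ends of a segment with the halves of the largest window."""
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(largest) / largest)
    half = largest // 2
    out = np.array(segment, dtype=float)   # copy, the input is left untouched
    out[:half] *= w[:half]                 # fade in with the first half
    out[-half:] *= w[half:]                # fade out with the second half
    return out

sizes = window_sizes(512, 4096, 4)
hops = hop_sizes(sizes, overlap=0.75)
weighted = preweight(np.ones(8192), largest=4096)
```

The weighted segment is meant only for the entropy evaluation; as stated in the text, the decomposition used for re-synthesis is computed on the unweighted signal.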

As for the first implementation, the weighting does not concern the decomposi- 
tion for re-synthesis purpose, but only the analyses used for entropy evaluations. 
After the pre-weighting, the algorithm follows the same steps described above: 
Fig. 5. An analysis segment: time locations of the Heisenberg boxes associated to the 
multi-frame used in the second version of our algorithm. 



calculation of the $|L|$ local spectrograms, evaluation of their entropy, selection of 
the window providing minimum entropy, computation of the adapted spectrogram 
with the best window at each time point, thus creating an analysis with 
time-varying resolution and hop size. 

In figure 6 we give a first example of an adaptive analysis performed by the 
second version of our algorithm with eight Hanning windows of different sizes: 
the sound is still the B4 note of a marimba, and we can see that the two versions 
give very similar results. Thus, if the considered application does not specifically 
ask for a fixed hop size of the overall analysis, the second version is preferable 
as it highly reduces the computational cost without affecting the best window 
choice. 

In figure 8 we give a second example with a synthetic sound, a sinusoid 
with sinusoidal frequency modulation: as figure 7 shows, a small window is best 
adapted where the frequency variation is fast compared to the window length; on 
Fig. 6. Example of an adaptive analysis performed by the second version of our algorithm 
with eight Hanning windows of different sizes from 512 to 4096 samples, on a 
B4 note played by a marimba sampled at 44.1 kHz: on top, the best window chosen as a 
function of time; at the bottom, the adaptive spectrogram. The entropy order is $\alpha = 0.7$ 
and each analysis segment contains four frames of the largest window analysis with a 
two-frame overlap between consecutive segments. 
the other hand, the largest window is better where the signal is almost stationary. 
Fig. 7. Two different spectrograms of a sinusoid with sinusoidal frequency modulation, 
with Hanning windows of sizes 512 (top) and 4096 (bottom) samples. 



4.1 Re-synthesis Method 

The re-synthesis method introduced in [12] gives a perfect reconstruction of the 
signal as a weighted expansion of the coefficients of its STFT in the original 
analysis frame. Let $S_f[n,k]$ be the STFT of a signal $f$ with window function $h$ 
and time step $a$; fixing $n$, through an iFFT we have a windowed segment of $f$ 

\[ f_h(n,l) = h(na-l)\, f(l) , \tag{20} \]

whose time location depends on $n$. An immediate perfect reconstruction of $f$ is 
given by 

\[ f(l) = \frac{\sum_{n=-\infty}^{+\infty} h(na-l)\, f_h(n,l)}{\sum_{n=-\infty}^{+\infty} h^2(na-l)} . \tag{21} \]
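
The reconstruction formula (21) amounts to a weighted overlap-add of the windowed segments, normalized by the sum of the squared windows. The sketch below illustrates it for a fixed window and hop; the function names, the frame layout by start index and the random test signal are assumptions made here, and the window is symmetric so that its time reversal in (20) plays no role.

```python
import numpy as np

def analysis(f, h, a):
    """STFT frames as (start index, spectrum of the windowed segment)."""
    size = len(h)
    return [(s, np.fft.rfft(h * f[s:s + size]))
            for s in range(0, len(f) - size + 1, a)]

def resynthesis(frames, h, length):
    """Weighted overlap-add in the spirit of (21): sum(h * f_h) / sum(h^2)."""
    size = len(h)
    num = np.zeros(length)
    den = np.zeros(length)
    for s, spectrum in frames:
        segment = np.fft.irfft(spectrum, n=size)   # windowed segment f_h(n, l)
        num[s:s + size] += h * segment
        den[s:s + size] += h ** 2
    out = np.zeros(length)
    covered = den > 1e-12                          # samples reached by some window
    out[covered] = num[covered] / den[covered]
    return out

rng = np.random.default_rng(1)
f = rng.standard_normal(4096)
h = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(512) / 512)   # periodic Hann
reconstructed = resynthesis(analysis(f, h, a=128), h, len(f))
```

Away from the signal boundaries, where every sample is covered by several windows, the output matches the input up to rounding errors.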

In our case, after the automatic selection step we have a temporal sequence 
with the best windows at each time position; in the first version we have a fixed 
hop for all the windows, in the second one every window has its own time step. 
In both cases we have thus reduced the initial multi-frame to a nonstationary 

Gabor frame: we extend the same technique of (21) using a variable window $h$ 
Fig. 8. Example of an adaptive analysis performed by the second version of our algorithm 
with eight Hanning windows of different sizes from 512 to 4096 samples, on 
a sinusoid with sinusoidal frequency modulation synthesized at 44.1 kHz: on top, the 
best window chosen as a function of time; at the bottom, the adaptive spectrogram. The 
entropy order is $\alpha = 0.7$ and each analysis segment contains four frames of the largest 
window analysis with a three-frame overlap between consecutive segments. 
and time step $a$ according to the composition of the reduced multi-frame, obtaining 
a perfect reconstruction as well. The interest of (21) is that the given 
distribution does not need to be the STFT of a signal: for example, a transformation 
$S^*[n,k]$ of the STFT of a signal could be considered. In this case, (21) 
gives the signal whose STFT has minimal least squares error with $S^*[n,k]$. 
As seen from equations (9) and (11), the theoretical existence and the mathematical 
definition of the canonical dual frame for a nonstationary Gabor frame 
like the one we employ has been provided [15]: it is thus possible to define the 
whole analysis and re-synthesis framework within the Gabor theory. We are at 
present working on the interesting analogies between the two approaches, to 
establish a unified interpretation and develop further extensions. 
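
The extension of the weighted overlap-add to a time-varying window and hop can be sketched in the same way: each frame carries its own window, and the normalization sums the squared windows actually used at each sample. The frame plan below (small windows with a small hop followed by large windows with a large hop) is a hypothetical selection result chosen for illustration, and the function names are assumptions.

```python
import numpy as np

def hann(size):
    """Periodic Hann window of the given length."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(size) / size)

def adaptive_analysis(f, plan):
    """Windowed segments for a plan of (start, window size) pairs."""
    return [(s, n, hann(n) * f[s:s + n]) for s, n in plan]

def adaptive_resynthesis(frames, length):
    """Overlap-add with a variable window, normalized by the local sum of h^2."""
    num = np.zeros(length)
    den = np.zeros(length)
    for s, n, segment in frames:
        h = hann(n)
        num[s:s + n] += h * segment
        den[s:s + n] += h ** 2
    out = np.zeros(length)
    covered = den > 0                      # samples with non-zero total weight
    out[covered] = num[covered] / den[covered]
    return out

# Hypothetical plan: size 512 with hop 128, then size 2048 with hop 512.
plan = ([(s, 512) for s in range(0, 2048, 128)]
        + [(s, 2048) for s in range(1024, 6145, 512)])

rng = np.random.default_rng(2)
f = rng.standard_normal(8192)
reconstructed = adaptive_resynthesis(adaptive_analysis(f, plan), len(f))
```

The only sample not recovered in this example is the very first one, where the single window covering it is exactly zero.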



5 Conclusions 

We have presented an algorithm for time-adaptation of the spectrogram resolution, 
which can be easily integrated in existing frameworks for analysis, transformation 
and re-synthesis of an audio signal: the adaptation is locally obtained 
through an entropy minimization within a finite set of resolutions, which can be 
defined by the user or left as default. The user can also specify the time duration 
and overlap of the analysis segments where entropy minimization is performed, 
to privilege more or less discontinuous adapted analyses. 

Future improvements of this method will concern the spectrogram adaptation 
in both time and frequency dimensions: this will provide a decomposition of the 
signal in several layers of analysis frames, thus requiring an extension of the 
proposed technique for re-synthesis. 